Abstract



We seek to find normative criteria of ad- 
equacy for nonmonotonic logic similar to 
the criterion of validity for deductive logic. 
Rather than stipulating that the conclusion 
of an inference be true in all models in which 
the premises are true, we require that the 
conclusion of a nonmonotonic inference be 
true in "almost all" models of a certain sort 
in which the premises are true. This "cer- 
tain sort" specification picks out the models 
that are relevant to the inference, taking into 
account factors such as specificity and vague- 
ness, and previous inferences. The frequen- 
cies characterizing the relevant models reflect 
known frequencies in our actual world. The 
criteria of adequacy for a default inference 
can be extended by thresholding to criteria of 
adequacy for an extension. We show that this 
avoids the implausibilities that might other- 
wise result from the chaining of default in- 
ferences. The model proportions, when con- 
strued in terms of frequencies, provide a veri- 
fiable grounding of default rules, and can be- 
come the basis for generating default rules 
from statistics. 

Keywords: probability, frequency, default 
logic 



1 Introduction 

Non-monotonic reasoning, for example default
logic [Reiter, 1980], models the intuitive process
of making non-deductive inferences in the face of
certain supportive but not conclusive evidence. Given
a default theory $\Delta = (D, F)$, we can obtain its



extensions by following a prescribed set of steps. 
However, on what grounds do we employ a particular 
default rule? Some writers would regard this as 
an inappropriate question, since they take as their 
goal the representation of human inference. To 
this end defaults represent rules that we take to be 
intuitively appropriate. But then when we apply 
these rules, we may be led to counterintuitive results
[Reiter and Criscuolo, 1981, Lukaszewicz, 1988,
Poole, 1989]. The underlying principle seems to be
circular: the original default rules are "intuitively 
good" at first glance, but when we discover that 
they do not give rise to the desired results, we tweak 
the rules until they give us those results. It seems 
that we have to know what results we want first 
before constructing the default theory, rather than 
having the default theory tell us what conclusions are 
warranted. This is precisely the reason why we need 
an independent measure of validity for default rules 
and default extensions. We think of nonmonotonic 
logic as sharing the normative character of other 
logics. From this point of view default rules require 
some defense. We will concentrate on default logic 
here, though much of what we have to say will apply 
to other nonmonotonic approaches as well. Much of 
the work on nonmonotonic logic has concerned the 
syntactic manipulation of the nonmonotonic rules, 
rather than their basic justification. 

1.1 Selective Preference 

For a default rule $d = \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$, $\alpha$ is the prerequisite,
$\beta_1, \ldots, \beta_n$ are the justifications, and $\gamma$ is the consequent
of d. Loosely speaking, the rule conveys the
idea that if $\alpha$ is provable, and $\neg\beta_1, \ldots, \neg\beta_n$ are each
not provable, then by default we conclude that $\gamma$ is
true. A default theory is an ordered pair (D, F), where
D is a set of default rules and F is a set of "facts". A
theory extended from F by applying the default rules
in D is known as an extension of the default theory.



Consider the following canonical example. 

Example 1 We have a default theory $\Delta = (D, F)$, where

$D = \left\{ \frac{R(x) : T(x)}{T(x)}, \; \frac{S(x) : \neg T(x)}{\neg T(x)} \right\}$,
$F = \{R(a), S(a)\}$.



We get two extensions, one containing T(a) and the
other containing $\neg T(a)$. If we take "R(x)" to mean
that x is a bird, "S(x)" to mean that x is a penguin,
and "T(x)" to mean that x flies, then we would like
to reject the extension containing "T(a)" (a flies) in
favor of the extension containing "$\neg T(a)$" (a does not
fly). However, if we take "S(x)" to mean that x is
an animal instead and keep "R(x)" and "T(x)" the
same, we would want to reverse our preference. Now
the extension containing "T(a)" (a flies) seems better.
□
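
The two extensions of Example 1 can be reproduced with a small enumeration. This is only a sketch under strong simplifying assumptions: every formula is a ground literal, there is no deductive closure beyond the literals themselves, and extensions are found by trying every application order. The function names `neg` and `extensions` and the string encoding of literals are illustrative and do not come from the paper.

```python
from itertools import permutations

def neg(lit):
    """Negate a ground literal written as 'T(a)' or '~T(a)'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def extensions(defaults, facts):
    """Enumerate extensions of a ground, literal-only default theory.

    Each default is a triple (prerequisite, justifications, consequent).
    Assumption: no deduction is performed; a set is consistent when it
    contains no literal together with its negation.
    """
    found = set()
    for order in permutations(defaults):
        current = set(facts)
        applied = []
        changed = True
        while changed:
            changed = False
            for pre, justs, cons in order:
                applicable = (
                    pre in current
                    and cons not in current
                    and all(neg(j) not in current for j in justs)
                )
                if applicable:
                    current.add(cons)
                    applied.append(justs)
                    changed = True
                    break  # restart from the first rule in this order
        # Keep the result only if every justification used is still consistent.
        if all(neg(j) not in current for justs in applied for j in justs):
            found.add(frozenset(current))
    return found

# Example 1 instantiated at the individual a.
defaults = [
    ("R(a)", ["T(a)"], "T(a)"),     # R(x) : T(x) / T(x)
    ("S(a)", ["~T(a)"], "~T(a)"),   # S(x) : ~T(x) / ~T(x)
]
facts = {"R(a)", "S(a)"}
for ext in extensions(defaults, facts):
    print(sorted(ext))
```

Nothing in the enumeration distinguishes the penguin reading of S from the animal reading, which is the point of the example: the preference between the two extensions cannot be read off the syntax.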

Note that each of the default rules involved in the 
example above is intuitively appealing when viewed 
by itself against our background knowledge: birds fly; 
penguins do not fly; and animals in general do not fly 
either. Moreover, both instantiations (penguins and 
animals) are syntactically identical. Thus, we cannot 
base our decision to prefer one default rule over the 
other by simply looking at their syntactic structures. 

It is the interaction between the default rules and evi- 
dence that gives rise to the selective preference above. 
We have the evidence that "a is a bird". If in addition
we also have "a is a penguin", we prefer the penguin
rule. If instead we have "a is an animal", we prefer
the bird rule. 

There are several approaches to circumventing
this conceptual difficulty. The first is to revise
the default theory so that the desired result is
achieved [Reiter and Criscuolo, 1981]. We can amend
the default rules by adding the exceptions as justifications,
for example $\frac{B(x) : F(x) \wedge \neg P(x)}{F(x)}$ and $\frac{A(x) : \neg F(x) \wedge \neg B(x)}{\neg F(x)}$.

With this approach we have to constantly revise the 
default rules to take into account additional excep- 
tions. We have little guidance in constructing the list 
of justifications except that the resulting default rule 
has to produce the "right" answer in the given situa- 
tion. 

Another approach is to establish some priority structure
over the set of defaults. For example, we can
refer to a specificity or inheritance hierarchy to determine
which default rule should be used in case of
a conflict [Touretzky, 1984, Horty et al., 1987]. The
penguin rule is more specific than the bird rule, when
both are applicable, and therefore we use the penguin
rule and not the bird rule. However, conflicting
rules do not always fit into neat hierarchies (for
example, adults are employed, students are not, how
about adult students? [Reiter and Criscuolo, 1981]).
It is not obvious how we can extend the hierarchical
structure without resorting to explicitly enumerating
the priority relations between the default
rules [Brewka, 1989, Brewka, 1994].



The third approach is to appeal to probabilistic analysis.
Defaults are interpreted as representing properties
of conditional probabilities. For example, the conditional
probability of a being able to fly given that a is
a bird is "high" [Pearl, 1988, Pearl, 1990] or increases
from the prior probability [Neufeld et al., 1990], while
the conditional probability of a being able to fly given
that a is a penguin is "low" or decreases. This approach
provides a probabilistic semantics for default
rules, but in a way which does not represent the fact 
that the conclusions are accepted. The default con- 
clusion is "Tweety flies," not "Probably Tweety flies." 
This is in contrast to the spirit of nonmonotonic rea- 
soning: default conclusions should be accepted as new 
facts, and we should be able to chain default rules and 
build upon the conclusions of previous default appli- 
cations to obtain further conclusions. 

1.2 Justifying Nonmonotonic Inference 

The justification of beliefs is a long standing issue
in epistemology. There is not much that is
problematic about the justification of beliefs obtained
by deductive inference (though there are
plenty of problems that surround deduction; see
[Kyburg, 1958, Haack, 1976, Dummett, 1978], not to
mention the voluminous literature on paraconsistent
logic [Priest, 1989]). The reason is that we can show
that the ordinary rules of deduction lead from premises
to conclusions that are true in every model in which
the premises are true. This is exactly what is not true
of ampliative inference, and it is what has led some
writers (e.g., [Morgan, 1998]) to deny that there is any
such thing as a nonmonotonic logic. This has been disputed
in [Kyburg, 2001].



But other kinds of justifications of beliefs have
been proposed. Isaac Levi [Levi, 1967, Levi, 1980,
Levi, 1996] has argued for many years that the way
to understand ampliative (inductive, nonmonotonic) 
argument is in terms of decision theory: we choose 
(decide) to accept a hypothesis in a given context pro- 
vided that the expected epistemic utility of doing so in 
that context is greater than the expected utility of any 
other epistemic act, such as suspending belief totally, 
or accepting a stronger hypothesis. 



Levi's approach employs a rich and detailed struc- 
ture for acceptance, and allows drawing many impor- 
tant distinctions. This structure requires three things 
that make it less than perfect as a vehicle for ordinary
nonmonotonic inferential systems. First, in keeping
with a long tradition in pragmatism [Dewey, 1938,
Peirce, 1903, James, 1948], the context of inquiry must
be tied to a specific problem: We need the answer to
a question. Second, the epistemic expectation of an
answer is the expected value of the information contained
in that answer. Thus we need to presuppose an
information measure on the language of our inquiry
[Levi, 1996, p. 169]. Third, we need to have available
a credal or inductive probability, based on a measure
(or convex set of measures) on the sentences of the language,
in terms of which a conditional probability (or
convex set of conditional probabilities) can be defined
[Levi, 1980, p. 52].

It is our belief both that in some contexts in which 
we might wish to use nonmonotonic mechanisms, this 
overhead is unnecessary, and perhaps itself difficult to 
justify, and that we would like to be able to explicate 
the justification of inference in a less context depen- 
dent way. 

Another approach that has attracted considerable
attention in the philosophical community in recent
years is that of "reliabilism" whose best known exponent
is Alvin Goldman [Goldman, 1979]. According
to this view, what justifies a belief is the fact
that it is obtained by a "reliable cognitive process..."
[Goldman, 1979, p. 20]. Of course there are a number
of additional hedges to the view that are required
for philosophical accuracy, and even with those hedges 
there remains a certain vagueness in the view. These 
details need not detain us, since we are seeking inspi- 
ration rather than philosophical precision. 

What does "reliable" mean? We will construe reliabil- 
ity in terms of frequency or propensity to yield truth 
when applied. Specifically, we will say that the belief
$\phi$ is nonmonotonically justified by a default rule if
the rule would frequently lead to truth and rarely to 
error, given what we know — given our background 
knowledge. 

A deductive argument is justified (valid) if its conclu- 
sion is true in every model of its premises. We will 
attempt to provide an analog of the justification of 
deductive rules: a default argument is justified if its 
conclusion is true in a high proportion of the relevant 
models in which its premises are true. To make this 
idea precise requires an excursion into model theory. 



2 Model Theory 

We will suppose that the underlying object language is 
a first order language that does not involve such inten- 
sional predicates as "know" or "believe." A number of 
nonmonotonic formalisms (specifically autoepistemic 
logic [Moore, 1985]) do involve such locutions within 
the object language, but they can be dispensed with 
in default logic. The default rule $\frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$ can be read
in terms of the nonmembership of $\ulcorner \neg\beta_i \urcorner$ in a specified
set of expressions T. In original default logic, T would 
just be an extension. 

There are a number of immediate problems associated 
with the idea of looking at the "proportion" of mod- 
els. The least of them is choosing a level at which to 
regard the evidence as adequate. Should we require 
that the proportion be 0.95? Or 0.99? Or 0.995? This 
is just the sort of question that arises in statistical hy- 
pothesis testing or in confidence interval estimation. 
We shall suppose that in a given context there is some 
agreed-upon level of security $\delta$; we will accept a conclusion
if the proportion of models in which we could
be committing an error is no greater than $\delta$.

This approach is to be contrasted with those of Adams
[Adams, 1975, Adams, 1966], Pearl [Pearl, 1988] and
Bacchus et al. [Bacchus et al., 1993]. Adams requires
that for A to be a reasonable consequence of the set
of sentences S, for any $\epsilon$ there must be a positive $\delta$
such that for every probability function, if the probability
of every sentence in S is greater than $1 - \delta$, then
the probability of A is at least $1 - \epsilon$ [Adams, 1966,
p. 274]. Pearl's approach similarly involves quantification
over possible probability functions. Bacchus et
al. again take the degree of belief of a statement to be
the limiting proportion of first order models in which
the statement is true. All of these approaches involve
matters that go well beyond what we may reasonably
suppose to be available to us as empirical enquirers.
Our $\delta$, on the other hand, serves much like the $\alpha$ of
statistical testing.
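
Adams' requirement, as paraphrased above, can be written out with its quantifiers explicit. The symbol $P$ for an arbitrary probability function is our notation, and we read "for any $\epsilon$" as ranging over positive values, which is an assumption.

```latex
\[
\forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall P :\quad
\bigl( \forall s \in S :\; P(s) > 1 - \delta \bigr)
\;\Rightarrow\; P(A) \ge 1 - \epsilon
\]
```

Written this way the contrast is visible: Adams' $\delta$ is existentially quantified and depends on $\epsilon$, whereas the level of security used here is a single value fixed by the context.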

We must restrict the number of models under consideration
to a finite number so that the idea of looking at
proportions makes sense.[1] We will be taking account
of statistical information, and to this end will want
each model to have a finite domain. Roughly speaking,
we take as a model of our language one in which the
domain of empirical individuals is of finite cardinality.
This may be regarded as problematic (it entails the
falsity of "every person has two parents and nobody
is his own ancestor") but with reasonable spatial and
temporal bounding it can be rendered plausible.

[1] We could, instead, seek to develop a way of proceeding
to a limit; this still would require restrictions to arrive at
a countable number of models, and would entail a large
expository cost for little gain in plausibility.

Even so, to ensure that the set of models is finite we 
must restrict the empirical domain even further. Not 
only must it consist of a finite set of physical entities, 
but this same set of physical entities $\mathcal{D}$ must be taken
to be the empirical domain of every model. 

We assume that it is possible to express statistical
knowledge in this language. For example, if
"B(x)" is the predicate "is a bird" and "F(x)"
is "can fly", we can express the fact that between
85% and 95% of birds fly by the formula
$0.85 \le \frac{|\{x : B(x) \wedge F(x)\}|}{|\{x : B(x)\}|} \le 0.95$. Employing the notation
of [Kyburg and Teng, 2001] we write this as
"%x(F(x), B(x), 0.85, 0.95)." This renders "%" a variable
binding operator on 4-sequences of expressions:
two formulas and two fractions.
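
In a single finite model, checking such a statistical statement amounts to counting. The sketch below treats the interpretations of the two formulas as Python sets. The function name `satisfies_stat` is hypothetical, and treating an empty reference class as failing the statement is an assumption of the sketch, since the text does not address that case.

```python
from fractions import Fraction

def satisfies_stat(target, reference, low, high):
    """Check a statement of the form %x(target(x), reference(x), low, high).

    `target` and `reference` are the sets of domain objects satisfying the
    two formulas in one finite model.
    """
    if not reference:
        return False  # assumption: an empty reference class fails the statement
    share = Fraction(len(target & reference), len(reference))
    return low <= share <= high

# A toy model with 20 birds, 18 of which fly, plus one flying non-bird.
birds = {f"b{i}" for i in range(20)}
fliers = {f"b{i}" for i in range(18)} | {"bat"}
print(satisfies_stat(fliers, birds, Fraction(85, 100), Fraction(95, 100)))
```

Exact fractions are used so that boundary cases such as a share of exactly 0.85 are not affected by floating-point rounding.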

We distinguish, as do Pearl and Geffner
[Pearl and Geffner, 1990, p. 70], between immediate
evidence, represented by a finite set of sentences
E concerning particular facts (to be distinguished 
from the general body of factual knowledge F invoked 
by classical default logic), and a finitely axiomatizable 
set of sentences K representing general background 
knowledge. What defaults are plausible depends, 
of course, on background knowledge. If it were not 
for what we take to be the typical (or natural, or 
frequent) behavior of birds, the world's best known 
example of a default rule would not be plausible. On 
the other hand no one has proposed the default rule 

$\frac{fish(a) : mackerel(a)}{can\text{-}talk(a)}$.

Thus in general we will represent the set of default
rules of a default theory as $\Delta_K$ rather than D, since
we take them to be a function of our body of general
knowledge K. Given an error tolerance $\delta$, we will take
a default rule to be $\delta$-valid if, for every set of possible
input sentences E consistent with K, the application of
the rule to E leads to a false conclusion in a proportion
of at most $\delta$ of the relevant models. More precisely, a
default rule is $\delta$-valid if and only if for every set of
input sentences E consistent with K to which the rule
is applicable, the proportion of models of $E \cup K$ in
which the conclusion of the rule is false is no more
than $\delta$.
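
The definition can be stated as a single condition on model counts. The notation $\mathrm{Mod}(\cdot)$ for the finite set of models of a set of sentences is introduced here for convenience and is not the paper's.

```latex
\[
d = \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}
\ \text{is } \delta\text{-valid}
\iff
\text{for every } E \text{ consistent with } K \text{ to which } d \text{ is applicable:}\quad
\frac{\bigl|\{\, m \in \mathrm{Mod}(E \cup K) : m \not\models \gamma \,\}\bigr|}
     {\bigl|\mathrm{Mod}(E \cup K)\bigr|} \;\le\; \delta
\]
```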

To fix our ideas, let us begin with a simple example.
Suppose K includes a statement to the effect that at
least $1 - \delta$ and not more than $1 - \epsilon$ of birds fly and nothing
else; that is, "%x(F(x), B(x), $1 - \delta$, $1 - \epsilon$)." Consider
the rule $\frac{B(x) : F(x)}{F(x)}$. This rule is "applicable" to
immediate evidence E only if $E \cup K$ entails a sentence
of the form $\ulcorner B(a) \urcorner$ and no corresponding sentence of
the form $\ulcorner \neg F(a) \urcorner$.

Our models have a single domain $\mathcal{D}$ of finite cardinality.
We will write "$I_m(\phi)$" for the interpretation of $\phi$
in the model m. The constraint imposed by K is that
for every model m the proportion of objects in $I_m(B)$
that are also in $I_m(F)$ lies in $[1 - \delta, 1 - \epsilon]$.

There are three cases. First, suppose $E \cup K$ does not
entail a sentence of the form $\ulcorner B(a) \urcorner$. Then the rule
is inapplicable. Second, suppose that for some term
a, $E \cup K$ entails "B(a)" and also entails "$\neg F(a)$".
The rule is again inapplicable, because it is blocked
by the failure of a justification. Third, suppose for
some term a, $E \cup K$ entails "B(a)" but not "$\neg F(a)$".
Then $I_m(a) \in I_m(B)$. There are $|I_m(B)|$ interpretations
of a that make $E \cup K$ true; of these a proportion of at least $1 - \delta$
make "F(a)" true. We have said nothing about interpreting
the rest of the language, but however many
interpretations there are (we have seen to it that there
are only a finite number) the proportion that renders
"F(a)" true will remain unchanged; it will be at least
$1 - \delta$. Thus, given the background knowledge that we
have posited, the rule is $\delta$-valid: if it is applicable it
will lead to error no more than $\delta$ of the time.
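
The counting argument for the third case can be checked by brute force on a small domain. The sketch below is a toy under stated assumptions: the language has only the predicates B and F and the constant a, the evidence is just B(a), a model is a triple of interpretations of B, F and a, and the upper bound on the share of flying birds is taken to be 1. The helper names are ours.

```python
from fractions import Fraction
from itertools import combinations

def subsets(domain):
    """Yield every subset of `domain` as a frozenset."""
    items = sorted(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def error_proportion(domain, low, high):
    """Proportion of models of E and K in which F(a) is false.

    K: the share of birds that fly lies in [low, high].
    E: B(a).
    """
    total = errors = 0
    for birds in subsets(domain):
        if not birds:
            continue  # B(a) cannot hold when nothing is a bird
        for fliers in subsets(domain):
            share = Fraction(len(birds & fliers), len(birds))
            if not (low <= share <= high):
                continue  # this interpretation of B and F violates K
            for a in birds:  # the interpretations of a that make B(a) true
                total += 1
                errors += a not in fliers
    return Fraction(errors, total)

delta = Fraction(1, 4)
print(error_proportion(range(4), 1 - delta, Fraction(1)))
```

Within each admissible interpretation of B and F, the share of choices of a that falsify F(a) is at most delta, so the proportion over all models, being a weighted average of those shares, cannot exceed delta either.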

Now let us consider a somewhat more complex example:
Suppose we know that typically birds fly, and that
typically penguins don't. If that is in our background
knowledge K, as well as "$\forall x (P(x) \supset B(x))$", then
the flying default becomes $\frac{B(x) : F(x) \wedge \neg P(x)}{F(x)}$, and we also
have the default $\frac{P(x) : \neg F(x)}{\neg F(x)}$. If E entails "P(a)", only
the second default is applicable. In no more than $\delta$ of
the models of E will a fly, unless $E \cup K$ entails that a
can fly.

Another example: Suppose K contains vague information
about the frequency with which red birds fly (perhaps
because we have encountered few red birds). Say
that we know the frequency to be between 0.50 and 1.0.
Since the interval for birds in general $[1 - \delta, 1 - \epsilon]$ is included
in [0.5, 1.0], this additional piece of information
should not interfere with our inference of flying ability.
There is no conflict between the two intervals, just
less precision in one. The rule about birds in general
can be applied to red birds. However, if K contains
the knowledge that between 0.5 and r of red birds fly,
where r is less than $1 - \epsilon$, then this information should
interfere. In this case the general rule should be so 
construed that it does not apply to red birds. If b is a 
red bird and not a penguin, no conclusion about flying 
ability is justified. 

We can arrange this by judiciously adding or deleting 
justifications in the general bird rule, in accordance 



with the statistical information in K: in the first case 
we allow red birds; in the second we must require that 
we do not know a is red: "$\neg R(x)$" must be among
the justifications of the rule. This statistical approach 
provides exactly the normative guidance that is lacking 
in the ad-hoc approach of tweaking default rules in 
order to arrive at the "intuitive" results. 

More generally, we can give recipes for constructing
$\delta$-valid defaults for conclusions of the form $\ulcorner \phi(a) \urcorner$[2]
from background knowledge K and immediate evidence
E. Let K contain $\ulcorner \%x(\phi(x), \psi(x), p, q) \urcorner$ and
$\ulcorner \%x(\phi(x), \psi'(x), p', q') \urcorner$. We consider three cases:

1. K entails $\ulcorner \forall x(\psi(x) \supset \psi'(x)) \urcorner$. There are three
subcases according to the relation among p, p', q,
q':

(a) (p < p' and q < q') or (p' < p and q' < q):
$\frac{\psi(x) : \phi(x)}{\phi(x)}$ and $\frac{\psi'(x) : \phi(x) \wedge \neg\psi(x)}{\phi(x)}$ are candidate
defaults.

(b) p < p' and q' < q:
$\frac{\psi'(x) : \phi(x)}{\phi(x)}$ is the only candidate default, since
the justification $\neg\psi'(x)$ of $\frac{\psi(x) : \phi(x) \wedge \neg\psi'(x)}{\phi(x)}$ is
inconsistent with the prerequisite $\psi(x)$ and
K.

(c) p' < p and q < q':
[two default rules, illegible in the source] are candidate
defaults.

2. K entails $\ulcorner \forall x(\psi'(x) \supset \psi(x)) \urcorner$. This is symmetrical
to case 1.

3. K entails neither $\ulcorner \forall x(\psi(x) \supset \psi'(x)) \urcorner$ nor
$\ulcorner \forall x(\psi'(x) \supset \psi(x)) \urcorner$. Again there are three subcases:

(a) (p < p' and q < q') or (p' < p and q' < q):
The candidate defaults are $\frac{\psi(x) : \phi(x) \wedge \neg\psi'(x)}{\phi(x)}$
and $\frac{\psi'(x) : \phi(x) \wedge \neg\psi(x)}{\phi(x)}$.

(b) p < p' and q' < q:
$\frac{\psi(x) : \phi(x) \wedge \neg\psi'(x)}{\phi(x)}$ and $\frac{\psi'(x) : \phi(x)}{\phi(x)}$ are candidate
default rules.

(c) p' < p and q < q': This is symmetrical to
case 3(b).
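
The subcases in the recipes above depend only on how the two intervals [p, q] and [p', q'] sit relative to each other. As a minimal sketch, the function below classifies a pair of intervals using the strict inequalities exactly as they are printed in the text; the name `subcase` is ours, and returning `None` when the intervals share an endpoint reflects the fact that the printed conditions do not cover that situation.

```python
def subcase(p, q, p2, q2):
    """Classify the intervals [p, q] and [p2, q2], where p2, q2 stand for p', q'."""
    if (p < p2 and q < q2) or (p2 < p and q2 < q):
        return "a"  # the intervals conflict: neither contains the other
    if p < p2 and q2 < q:
        return "b"  # [p', q'] lies strictly inside [p, q]
    if p2 < p and q < q2:
        return "c"  # [p, q] lies strictly inside [p', q']
    return None     # shared endpoint: not covered by the strict inequalities

# Red birds versus birds in general, taking delta = 0.05 and epsilon = 0.01
# as illustrative values, so the general interval is [0.95, 0.99].
print(subcase(0.5, 1.0, 0.95, 0.99))  # vague red-bird statistic
print(subcase(0.5, 0.9, 0.95, 0.99))  # red-bird statistic with r = 0.9
```

With the vague statistic the general interval is nested inside the red-bird interval (subcase b), so the general bird rule stands; with r = 0.9 the intervals conflict (subcase a), which is the situation in which the text requires an exception for red birds.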



[2] Of course any default conclusion can be given this
form, particularly if we allow the term a to be an n-sequence
of terms. Furthermore, any such conclusion can
be taken to be an instance of the consequent of a statistical
generalization, in virtue of the fact that statistical
generalizations merely impose bounds. We are not imposing
serious limitations on default rules. For details, see
[Kyburg and Teng, 2001].



Having generated a list of candidate default rules
based on our background knowledge K, we delete those
rules derived from statistics with a lower measure less
than $1 - \delta$. The remainder is the set of defaults $\Delta_K$.
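
The deletion step is a simple filter on the lower bound of the statistic behind each candidate rule. In the sketch below, representing a candidate as a dictionary with `name`, `low` and `high` keys is purely illustrative, as are the rule names and numbers.

```python
from fractions import Fraction

def delta_valid(candidates, delta):
    """Keep the candidates whose statistic has a lower bound of at least 1 - delta."""
    return [rule for rule in candidates if rule["low"] >= 1 - delta]

candidates = [
    {"name": "birds fly", "low": Fraction(95, 100), "high": Fraction(99, 100)},
    {"name": "red birds fly", "low": Fraction(1, 2), "high": Fraction(9, 10)},
]
kept = delta_valid(candidates, Fraction(5, 100))
print([rule["name"] for rule in kept])
```

Only the lower bound matters here: a rule survives when even the least favorable frequency compatible with K keeps the error proportion within the tolerance.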

We have not taken account of relations among default 
conclusions that may be entailed by K. If K contains
$\ulcorner \forall x(\phi(x) \equiv \phi'(x)) \urcorner$ then the default conclusion
$\ulcorner \phi(a) \urcorner$ behaves just like the default conclusion $\ulcorner \phi'(a) \urcorner$.
If K contains $\ulcorner \forall x(\phi(x) \supset \phi'(x)) \urcorner$, then since $\ulcorner \phi(x) \urcorner$
is equivalent to $\ulcorner \phi(x) \wedge \phi'(x) \urcorner$ and $\ulcorner \phi'(x) \urcorner$ is equivalent
to $\ulcorner \phi(x) \vee \phi'(x) \urcorner$ we can make use of the obvious
entailment relations.

Soundness of a system of deductive logic requires that 
the conclusion of any inference be true in every model 
in which the premises are true. Clearly nonmonotonic 
inference should not be sound. But there is a property 
that is like soundness that applies to default inference. 
It is the property that the conclusion is false in at most 
a fraction $\delta$ of the models of the premises $K \cup E$.

Theorem 1 (Default Soundness) For every set of
observations E, if $d \in \Delta_K$ is applicable to E, the proportion
of models of $E \cup K$ in which the conclusion of
d is false is at most $\delta$.

The proof of this theorem is provided by the
soundness theorem for evidential probability
[Kyburg and Teng, 2001, p. 241], since the rules
for deriving defaults are a subset of the rules for
computing evidential probabilities. □



3 Interactions within an Extension 

Having determined which default rules are justified 
with respect to the background knowledge, the next 
step is to investigate the interaction between default 
rules in generating an extension. A default extension 
is a minimal deductively closed set that contains the 
given facts and the consequents of all applicable de- 
fault rules. Given an evidence set, we need to deter- 
mine how to control the compound effects of multiple 
defaults in an extension. 



Take for example a default version [Poole, 1989] of the
probabilistic lottery paradox [Kyburg, 1961]. There
are n species of birds, $S_1, \ldots, S_n$. We can say that
penguins are atypical in that they cannot fly; hum- 
mingbirds are atypical in that they have very fine mo- 
tor control; parrots are atypical in that they could talk; 
and so on. If we apply this train of thought to all n 
species of birds, there is no typical bird left, as for each 
species there is always at least one aspect in which it 
is atypical. A parallel scenario is formulated below. 



Example 2 K contains

$B(x) \equiv S_1(x) \vee \ldots \vee S_n(x)$
[an exhaustive list of bird species]
$S_i(x) \supset \neg S_j(x)$, for all $j \ne i$
[species are mutually exclusive]
$\%x(S_i(x), B(x), \epsilon_i, \delta_i)$, for $1 \le i \le n$
[the proportion of each $S_i$ species of birds
is "small"]

From K we can derive n $\delta^*$-valid default rules for $\Delta_K$:

$d_i = \frac{B(x) : \neg S_i(x)}{\neg S_i(x)}$, for $1 \le i \le n$

where $\delta^*$ is the maximum of $\delta_1, \ldots, \delta_n$.

Now consider the evidence set E = {B(a)}. In the
original formulation of default logic, we would get n extensions,
each one containing one $S_i(a)$ and the negations
of all the other $S_j(a)$'s. Thus, for each extension,
we would conclude that a is a particular species of bird,
which seems to be an overcommitment, considering we
have $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$ in K. □

Note that each of the n default rules is $\delta$-valid when
considered individually, but in an extension the rules
interact to sanction a set of conclusions that when
taken together seems implausible according to our
knowledge of model frequencies. The definition of an
extension dictates that we must keep applying rules
until all "applicable" ones are exhausted. The "applicability"
condition is based on maximizing logical
strength: for $d = \frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$, as long as $\alpha$ is derivable,
and the $\beta$'s are consistent with the extension, we must
apply d and add $\gamma$ to the extension. Thus, for each
of the extensions above, we have to keep applying the
rules until we have drawn n - 1 conclusions: $\neg S_j(a)$
for all $j \ne i$. Then the consistency requirement blocks
the last default rule $d_i$, as $B(x) \supset S_1(x) \vee \ldots \vee S_n(x)$
together with $\neg S_j(a)$ for all $j \ne i$ gives us $S_i(a)$, contradicting
the $\beta$ of $d_i$. From $\%x(S_i(x), B(x), \epsilon_i, \delta_i)$, we
know the proportion of models in which $S_i(a)$ is true,
and thus the proportion of models satisfying this extension,
given E, is at most $\delta_i$, a small ratio.

3.1 Sequential Thresholding 

The validity criteria for individual default rules can be 
extended to extensions resulting from the application
of a chain of default rules. We can think of the task 
of regulating the compound effect of multiple default 
rules as adjusting the set of relevant models by taking 
into account the default conclusions of all previously 
applied rules in the chain of reasoning. 



One way to accomplish this is by sequential thresholding
[Teng, 1997]. The applicability condition of a
default rule $\frac{\alpha : \beta_1, \ldots, \beta_n}{\gamma}$ in an extension can be modified
to take into account the validity of the rule. In addition
to requiring that $\alpha$ is provable and that none of
$\neg\beta_1, \ldots, \neg\beta_n$ are provable, we require that the default
rule be "above threshold", that is, the proportion of
relevant models satisfying the consequent $\gamma$ be greater
than a threshold $1 - \epsilon^*$.

The set of relevant models shrinks in a stepwise fash- 
ion. We start out with all the models satisfying the 
background knowledge and evidence we have. As de- 
fault rules are applied sequentially, the consequent of 
the applied rule at each step is taken as true in all 
subsequent steps. The relevant models at a particu- 
lar step are then those that are consistent with the 
given facts and all the consequents of the rules applied 
in the previous steps. A default rule, even if it is $\delta$-valid
with respect to the background knowledge, would
be blocked from application if it does not satisfy the 
thresholding criterion. 



In [Teng, 1997], the thresholding metric is based on a 
simple probability measure of possible worlds. We can 
easily extend this metric to employ the same measure 
as that used for evaluating the $\delta$-validity of default
rules. 

Example 3 Reconsider Example 2. Let us take $\epsilon^* > \delta^*$.
We start out with the set $\mathcal{M}$ of all models satisfying
K and E. From $\%x(S_1(x), B(x), \epsilon_1, \delta_1)$ we know
that $d_1$ is above threshold, and it satisfies the other
conditions for applicability. Therefore we apply $d_1$ and
conclude $\neg S_1(a)$.

Now consider $d_2$. The set $\mathcal{M}'$ of relevant models at
this point is a subset of $\mathcal{M}$; it contains only those
models in $\mathcal{M}$ that satisfy our new conclusion $\neg S_1(a)$
as well. We have eliminated the models in which $S_1(a)$
is true. Since $S_1(x) \supset \neg S_2(x)$, and $S_1(x) \supset B(x)$, all
the models eliminated satisfy B(a), and none satisfies
$S_2(a)$. Thus, in $\mathcal{M}'$, the number of models satisfying
$S_2(a)$ is the same as in $\mathcal{M}$. However, the number
of models satisfying B(a) is lower in $\mathcal{M}'$ as a result
of the addition of $\neg S_1(a)$. This gives rise to a higher
proportion of models satisfying $S_2(a)$ in $\mathcal{M}'$ ($\delta'_2$) than
in $\mathcal{M}$ ($\delta_2$). If $\delta'_2 < \epsilon^*$, $d_2$ is still above threshold after
the application of $d_1$, and we can apply it to obtain
$\neg S_2(a)$. Otherwise, $d_2$ is below threshold, and we cannot
apply it even though it was above threshold before
the application of $d_1$.

After each step of applying a rule, the set of relevant 
models shrinks, and the proportion of $S_i(a)$ of any
unapplied rule $d_i$ increases. After a number of steps,



all the remaining rules would be below threshold, and 
we thus obtain an extension containing only a portion 
of the conclusions that would otherwise be present in 
the non-thresholding version of the extension. □ 
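
The stepwise shrinking of the relevant models in Example 3 can be simulated directly. The sketch below makes several assumptions that go beyond the text: each species has an exactly known share among the models of K and E (the paper only bounds these shares), the shares sum to 1, and the rules are tried in index order. The function name `sequential_thresholding` and the numbers are illustrative.

```python
from fractions import Fraction

def sequential_thresholding(species_share, eps_star):
    """Return the indices i whose rule, concluding not-S_i(a), gets applied.

    species_share[i] is the assumed exact share of models of K and E in
    which S_i(a) holds. A rule is above threshold when the proportion of
    currently relevant models satisfying S_i(a) is below eps_star.
    """
    remaining = Fraction(1)  # share of the original models still relevant
    applied = []
    for i, share in enumerate(species_share):
        current = share / remaining  # proportion of S_i(a) among relevant models
        if current < eps_star:
            applied.append(i)
            remaining -= share  # drop the models in which S_i(a) is true
    return applied

shares = [Fraction(1, 5)] * 5  # five equally common species
print(sequential_thresholding(shares, Fraction(3, 10)))
print(sequential_thresholding(shares, Fraction(1)))
```

A single pass suffices because the proportion attached to an unapplied rule only grows as models are removed, so a rule that is blocked once stays blocked. With the threshold parameter set to 1 all but the last rule is applied, which matches the n - 1 conclusions of the non-thresholding extension.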

Note that the size of $\epsilon^*$ determines how much risk is
tolerated in an extension. The higher the $\epsilon^*$, the more
of the rules can be applied and the longer they can
stay above threshold. Reiter's non-thresholding version
corresponds to the case when $\epsilon^* = 1$; that is,
every rule whose associated proportion is above 0 is
allowed, and logical consistency alone determines the
rule's admissibility.

4 Concluding Remarks 

We have developed a notion of validity for default inference
based on model proportions. A rule is $\delta$-valid
if the proportion of models in which the consequent of
the rule is satisfied is greater than $1 - \delta$ in the relevant
models picked out by the background knowledge,
the evidence, and the applicability conditions of the
default rule. Given a body of background knowledge
K, we can systematically generate candidate default
rules and determine which ones are $\delta$-valid based on
the statistical facts known in K. Conflicts between default
rules stemming from multiple inheritance are resolved
as a consequence of the validation process. The
result is a set of $\delta$-valid default rules which are "precompiled"
for a given background knowledge base,
and can be reused for different evidence sets without
change.

This idea of evaluating the validity of a default rule 
using model proportions is extended to extensions generated
by a combination of rules. The compound effect
is regulated by a sequential thresholding process,
which blocks the rules whose associated model propor- 
tions with respect to the "current" (shrinking) set of 
models fall below a particular comfort threshold. This 
allows us to use a more reasonable "closure condition" 
for extension than the usual maximal logical strength: 
we can refrain from applying rules that would make 
the extension satisfiable in only a small set of models, 
even if the consequent of the rule is logically consistent 
with the extension. 

Grounding the justification of default rules in model 
proportions provides a way to validate the rules em- 
pirically, and is a first step towards automating the 
learning of default rules from (statistical) data. One 
might ask why we need the default rules when we can 
reason with the statistics directly. Default rules pro- 
vide a succinct and more understandable characteri- 
zation of the import of the data, as well as a smooth 



articulation of the information that may exist in the 
knowledge base. 